Smart Paper Notes

Supervised PCA: A Multiobjective Approach

Alexander Ritchie , Laura Balzano , Daniel Kessler , Chandra S. Sripada , Clayton Scott

Categories: Machine Learning (Statistics) | Machine Learning

2020-11-10

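Methods for supervised principal component analysis (SPCA) aim to incorporate label information into principal component analysis (PCA), so that the extracted features are more useful for a prediction task of interest. Prior work on SPCA has focused primarily on optimizing prediction error and has neglected the value of maximizing the variance explained by the extracted features. We propose a new method for SPCA that addresses both objectives jointly, and we demonstrate empirically that our approach dominates existing approaches, i.e., outperforms them with respect to both prediction error and variation explained. Our approach accommodates arbitrary supervised learning losses and, through a statistical reformulation, provides a novel low-rank extension of generalized linear models.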
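
The trade-off between the two objectives can be illustrated with a toy scalarization. The sketch below is not the authors' algorithm: it assumes centred 2-D data, a single extracted feature, squared-error loss, and a hypothetical weight `lam` that blends prediction error with unexplained variance. The function names are illustrative only.

```python
import math

def objectives(X, y, theta):
    """Return (prediction MSE, fraction of variance explained) for the
    unit direction w = (cos theta, sin theta) on centred 2-D data."""
    w = (math.cos(theta), math.sin(theta))
    z = [x[0] * w[0] + x[1] * w[1] for x in X]  # 1-D extracted feature
    n = len(X)
    var_z = sum(v * v for v in z) / n
    total_var = sum(x[0] ** 2 + x[1] ** 2 for x in X) / n
    # Least-squares slope of y on z (both centred, so no intercept).
    beta = sum(a * b for a, b in zip(z, y)) / (n * var_z) if var_z > 0 else 0.0
    mse = sum((b - beta * a) ** 2 for a, b in zip(z, y)) / n
    return mse, var_z / total_var

def best_direction(X, y, lam, steps=180):
    """Grid-search theta minimising lam*MSE + (1-lam)*(1 - variance explained)."""
    def cost(theta):
        mse, ve = objectives(X, y, theta)
        return lam * mse + (1.0 - lam) * (1.0 - ve)
    return min((math.pi * k / steps for k in range(steps)), key=cost)

# Toy data (assumed): most variance lies along axis 0, but y depends on axis 1.
X = [(-3.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (3.0, 1.0)]
y = [-1.0, 1.0, -1.0, 1.0]
pca_like = best_direction(X, y, lam=0.0)    # variance only
supervised = best_direction(X, y, lam=1.0)  # prediction only
```

With `lam=0.0` the search behaves like ordinary PCA, and with `lam=1.0` it ignores variance entirely. Intermediate weights trace out compromises between the two objectives, which is the kind of trade-off a multiobjective formulation makes explicit.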

Translated by Google Translate

NBC-Softmax : Darkweb Author fingerprinting and migration tracking

Gayan K. Kulatilleke , Shekhar S. Chandra , Marius Portmann

Categories: Machine Learning | Artificial Intelligence | Natural Language Processing

2022-12-15

Metric learning aims to learn distances from the data, which enhances the performance of similarity-based algorithms. An author style detection task is a metric learning problem, where learning style features with small intra-class variations and larger inter-class differences is of great importance to achieve better performance. Recently, metric learning based on softmax loss has been used successfully for style detection. While softmax loss can produce separable representations, its discriminative power is relatively poor. In this work, we propose NBC-Softmax, a contrastive-loss-based clustering technique for softmax loss, which is more intuitive and able to achieve superior performance. Our technique meets the criterion for a larger number of samples, thus achieving block contrastiveness, which is proven to outperform pair-wise losses. It uses mini-batch sampling effectively and is scalable. Experiments on 4 darkweb social forums with NBCSAuthor, which uses the proposed NBC-Softmax for author and sybil detection, show that our negative block contrastive approach consistently outperforms state-of-the-art methods using the same network architecture. Our code is publicly available at https://github.com/gayanku/NBC-Softmax

Automated anomaly-aware 3D segmentation of bones and cartilages in knee MR images from the Osteoarthritis Initiative

Boyeong Woo , Craig Engstrom , William Baresic , Jurgen Fripp , Stuart Crozier , Shekhar S. Chandra

Categories: Computer Vision | Machine Learning

2022-11-30

In medical image analysis, automated segmentation of multi-component anatomical structures, which often have a spectrum of potential anomalies and pathologies, is a challenging task. In this work, we develop a multi-step approach using U-Net-based neural networks to initially detect anomalies (bone marrow lesions, bone cysts) in the distal femur, proximal tibia and patella from 3D magnetic resonance (MR) images of the knee in individuals with varying grades of osteoarthritis. Subsequently, the extracted data are used for downstream tasks involving semantic segmentation of individual bone and cartilage volumes as well as bone anomalies. For anomaly detection, the U-Net-based models were developed to reconstruct the bone profiles of the femur and tibia in images via inpainting so anomalous bone regions could be replaced with close to normal appearances. The reconstruction error was used to detect bone anomalies. A second anomaly-aware network, which was compared to anomaly-naïve segmentation networks, was used to provide a final automated segmentation of the femoral, tibial and patellar bones and cartilages from the knee MR images containing a spectrum of bone anomalies. The anomaly-aware segmentation approach provided up to 58% reduction in Hausdorff distances for bone segmentations compared to the results from the anomaly-naïve segmentation networks. In addition, the anomaly-aware networks were able to detect bone lesions in the MR images with greater sensitivity and specificity (area under the receiver operating characteristic curve [AUC] up to 0.896) compared to the anomaly-naïve segmentation networks (AUC up to 0.874).
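
For readers unfamiliar with the Hausdorff distance quoted above, here is a minimal sketch of the standard symmetric definition on small point sets. It assumes segmentation surfaces have already been reduced to lists of 3D coordinates; the brute-force loop and the toy coordinates are illustrative and are not the evaluation code used in the paper.

```python
import math

def directed_hausdorff(A, B):
    """Largest distance from a point in A to its nearest point in B."""
    return max(min(math.dist(a, b) for b in B) for a in A)

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

# Toy surfaces (assumed): the prediction contains one stray point.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.0, 3.0)]
distance = hausdorff(reference, predicted)
```

Because the metric takes a maximum over points, a single outlying voxel dominates the result, which is why it is sensitive to the kind of local errors that bone anomalies can cause.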

Efficient block contrastive learning via parameter-free meta-node approximation

Gayan K. Kulatilleke , Marius Portmann , Shekhar S. Chandra

Categories: Machine Learning

2022-09-28

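Contrastive learning has recently achieved remarkable success in many domains, including graphs. However, contrastive loss, especially for graphs, requires a large number of negative samples, which is unscalable and computationally prohibitive with a quadratic time complexity. Sub-sampling is not optimal, and incorrect negative sampling leads to sampling bias. In this work, we propose a meta-node-based approximation technique that can (a) proxy all negative combinations (b) in time quadratic in the number of clusters, (c) at graph level rather than node level, and (d) exploit graph sparsity. By replacing node pairs with additive cluster pairs, we compute the negatives in cluster time at graph level. The resulting proxy approximated meta-node contrastive (PAMC) loss, based on simple optimized GPU operations, captures the full set of negatives yet is efficient, with linear time complexity. By avoiding sampling, we effectively eliminate sample bias. We meet the criterion for a larger number of samples, thus achieving block contrastiveness, which is proven to outperform pair-wise losses. We use learnt soft cluster assignments for the meta-node construction, and avoid the possible heterophily and noise added during edge creation. Theoretically, we show that real-world graphs easily satisfy the conditions necessary for our approximation. Empirically, we show promising accuracy gains over state-of-the-art graph clustering on 6 benchmarks. Importantly, we gain substantially in efficiency: up to 3x in training time, 1.8x in inference time, and over 5x in GPU memory reduction.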
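
One way to see why additive cluster pairs can stand in for node pairs is the bilinearity of the dot product: summing similarities over every node pair across two clusters equals a single dot product between the two cluster sums. The sketch below checks that identity on toy embeddings. It assumes raw dot-product similarity and hard cluster membership, and it does not reproduce the PAMC loss itself, whose exact form the abstract does not give.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pairwise_sum(A, B):
    """Sum of dot products over every node pair (a, b): |A|*|B| products."""
    return sum(dot(a, b) for a in A for b in B)

def meta_node(A):
    """Additive meta-node: element-wise sum of the node embeddings in a cluster."""
    return [sum(col) for col in zip(*A)]

def meta_node_sum(A, B):
    """Same quantity from one dot product between the two meta-nodes."""
    return dot(meta_node(A), meta_node(B))

# Toy clusters of 2-D node embeddings (assumed values).
cluster_a = [[1.0, 2.0], [3.0, 4.0]]
cluster_b = [[0.5, -1.0], [2.0, 0.0], [1.0, 1.0]]
exact = pairwise_sum(cluster_a, cluster_b)
proxy = meta_node_sum(cluster_a, cluster_b)
```

Building the meta-nodes costs time linear in the number of nodes, after which only cluster pairs need to be compared. Once a nonlinearity such as an exponential is applied to each similarity, the equality no longer holds exactly, which is presumably where the approximation conditions mentioned in the abstract come in.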

Skin Lesion Recognition with Class-Hierarchy Regularized Hyperbolic Embeddings

Zhen Yu , Toan Nguyen , Yaniv Gal , Lie Ju , Shekhar S. Chandra , Lei Zhang , Paul Bonnington , Victoria Mar , Zhiyong Wang , Zongyuan Ge

Categories: Computer Vision

2022-09-13

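In practice, many medical datasets have an underlying taxonomy defined over the disease label space. However, existing classification algorithms for medical diagnosis often assume semantically independent labels. In this study, we aim to leverage the class hierarchy with deep learning algorithms for more accurate and reliable skin lesion recognition. We propose a hyperbolic network to jointly learn image embeddings and class prototypes. Hyperbolic space has been shown to model hierarchical relations better than Euclidean geometry. Meanwhile, we constrain the distribution of the hyperbolic prototypes with a distance matrix encoded from the class hierarchy. Accordingly, the learned prototypes preserve the semantic class relations in the embedding space, and we can predict the label of an image by assigning its feature to the nearest hyperbolic class prototype. We use an in-house skin lesion dataset, consisting of around 230k dermoscopic images of 65 skin diseases, to verify our method. Extensive experiments provide evidence that our model achieves higher accuracy with less severe classification errors than models that do not consider class relations.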
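
The nearest-prototype prediction step can be sketched as follows. This assumes the Poincaré ball model of hyperbolic space, which the abstract does not specify, and uses made-up 2-D prototypes and class labels purely for illustration.

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball."""
    sq = lambda x: sum(c * c for c in x)
    diff = sq([a - b for a, b in zip(u, v)])
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - sq(u)) * (1.0 - sq(v))))

def nearest_prototype(feature, prototypes):
    """Return the label whose prototype is closest in hyperbolic distance."""
    return min(prototypes, key=lambda label: poincare_distance(feature, prototypes[label]))

# Hypothetical class prototypes and image feature inside the unit disk.
prototypes = {"melanoma": (0.5, 0.0), "nevus": (-0.5, 0.0)}
label = nearest_prototype((0.3, 0.1), prototypes)
```

Distances in this model grow rapidly near the boundary of the ball, which gives tree-like hierarchies room to spread out. That property is the usual motivation for preferring hyperbolic over Euclidean embeddings when classes form a taxonomy.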

Bias Challenges in Counterfactual Data Augmentation

S Chandra Mouli , Yangze Zhou , Bruno Ribeiro

Categories: Machine Learning | Machine Learning (Statistics)

2022-09-12

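Deep learning models tend not to be out-of-distribution robust, primarily because they rely on spurious features to solve the task. Counterfactual data augmentation provides a general way of (approximately) achieving representations that are counterfactual-invariant to spurious features, a requirement for out-of-distribution (OOD) robustness. In this work, we show that counterfactual data augmentation may fail to achieve the desired counterfactual invariance if the augmentation is performed by a context-guessing machine. We theoretically analyze the invariance imposed by such counterfactual data augmentation and describe an exemplar NLP task in which counterfactual data augmentation by a context-guessing machine does not lead to robust OOD classifiers.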

Selective Inference for Sparse Multitask Regression with Applications in Neuroimaging

Snigdha Panigrahi , Natasha Stewart , Chandra Sekhar Sripada , Elizaveta Levina

Categories: Machine Learning (Statistics)

2022-05-27

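Multi-task learning is frequently used to model a set of related response variables from the same set of features, improving predictive performance and modeling accuracy relative to methods that handle each response variable separately. Despite the potential of multi-task learning to yield more powerful inference than single-task alternatives, prior work in this area has largely omitted uncertainty quantification. Our focus in this paper is a common multi-task problem in neuroimaging, where the goal is to understand the relationship between multiple cognitive task scores (or other subject-level assessments) and brain connectome data collected from imaging. We propose a framework for selective inference to address this problem, with the flexibility to: (i) jointly identify the relevant covariates for each task through a sparsity-inducing penalty, and (ii) conduct valid inference in a model based on the estimated sparsity structure. Our framework offers a new conditional procedure for inference, based on a refinement of the selection event that yields a tractable selection-adjusted likelihood. This gives an approximate system of estimating equations for maximum likelihood inference, solvable via a single convex optimization problem, and enables us to efficiently form confidence intervals with approximately the correct coverage. Applied to both simulated data and data from the Adolescent Brain Cognitive Development (ABCD) study, our selective inference methods yield tighter confidence intervals than commonly used alternatives such as data splitting. We also demonstrate through simulations that multi-task learning with selective inference can recover true signals more accurately than single-task methods.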

Transformer Compressed Sensing via Global Image Tokens

Marlon Bran Lorenzana , Craig Engstrom , Shekhar S. Chandra

Categories: Computer Vision | Machine Learning

2022-03-24

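Convolutional neural networks (CNN) have demonstrated outstanding compressed sensing (CS) performance compared to traditional, hand-crafted methods. However, they are broadly limited in terms of generalisability, inductive bias, and difficulty modeling long-distance relationships. Transformer neural networks (TNN) overcome such issues by implementing an attention mechanism designed to capture dependencies between inputs. However, high-resolution tasks typically require vision Transformers (ViT) to decompose an image into patch-based tokens, limiting inputs to inherently local contexts. We propose a novel image decomposition that naturally embeds images into low-resolution inputs. These Kaleidoscope tokens (KD) provide a mechanism for global attention at the same computational cost as a patch-based approach. To showcase this development, we replace CNN components in a well-known CS-MRI neural network with TNN blocks and demonstrate the improvements afforded by KD. We also propose an ensemble of image tokens, which enhances overall image quality and reduces model size. Supplementary material is available at https://github.com/uqmarlonbran/tcs.git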

Generalisable 3D Fabric Architecture for Streamlined Universal Multi-Dataset Medical Image Segmentation

Siyu Liu , Wei Dai , Craig Engstrom , Jurgen Fripp , Stuart Crozier , Jason A. Dowling , Shekhar S. Chandra

Categories: Computer Vision

2020-06-28

Data scarcity is common in deep learning models for medical image segmentation. Previous works proposed multi-dataset learning, either simultaneously or via transfer learning, to expand training sets. However, medical image datasets have diverse-sized images and features, and developing a model simultaneously for multiple datasets is challenging. This work proposes Fabric Image Representation Encoding Network (FIRENet), a universal architecture for simultaneous multi-dataset segmentation and transfer learning involving arbitrary numbers of dataset(s). To handle different-sized images and features, a 3D fabric module is used to encapsulate many multi-scale sub-architectures. An optimal combination of these sub-architectures can be implicitly learnt to best suit the target dataset(s). For diverse-scale feature extraction, a 3D extension of atrous spatial pyramid pooling (ASPP3D) is used in each fabric node for a fine-grained coverage of rich-scale image features. In the first experiment, FIRENet performed 3D universal bone segmentation of multiple musculoskeletal datasets of the human knee, shoulder and hip joints and exhibited excellent simultaneous multi-dataset segmentation performance. When tested for transfer learning, FIRENet further exhibited excellent single dataset performance (when pre-training on a prostate dataset), as well as significantly improved universal bone segmentation performance. The second experiment involved the simultaneous segmentation of the 10 Medical Segmentation Decathlon (MSD) challenge datasets. FIRENet demonstrated good multi-dataset segmentation results and inter-dataset adaptability to highly diverse image sizes. In both experiments, FIRENet streamlined multi-dataset learning with one unified network that requires no hyper-parameter tuning.

Computing the Performance of A New Adaptive Sampling Algorithm Based on The Gittins Index in Experiments with Exponential Rewards

James K. He , Sofía S. Villar , Lida Mavrogonatou

Categories: Machine Learning

2023-01-03

Designing experiments often requires balancing between learning about the true treatment effects and earning from allocating more samples to the superior treatment. While optimal algorithms for the Multi-Armed Bandit Problem (MABP) provide allocation policies that optimally balance learning and earning, they tend to be computationally expensive. The Gittins Index (GI) is a solution to the MABP that can simultaneously attain optimality and computational efficiency goals, and it has been recently used in experiments with Bernoulli and Gaussian rewards. For the first time, we present a modification of the GI rule that can be used in experiments with exponentially-distributed rewards. We report its performance in simulated 2-armed and 3-armed experiments. Compared to traditional non-adaptive designs, our novel GI modified design shows operating characteristics comparable in learning (e.g. statistical power) but substantially better in earning (e.g. direct benefits). This illustrates the potential that designs using a GI approach to allocate participants have to improve participant benefits, increase efficiencies, and reduce experimental costs in adaptive multi-armed experiments with exponential rewards.
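
Bayesian index rules of this kind track a posterior state for each arm and update it as rewards arrive. The sketch below shows only that bookkeeping for exponentially distributed rewards, assuming a Gamma prior on the exponential rate (the standard conjugate choice; the abstract does not say which prior the authors use). It does not compute the Gittins Index itself, and the function names are illustrative.

```python
def update(prior, rewards):
    """Conjugate update for exponential rewards with unknown rate.

    prior = (shape, rate) of a Gamma distribution on the exponential rate.
    """
    shape, rate = prior
    return shape + len(rewards), rate + sum(rewards)

def posterior_mean_reward(posterior):
    """Posterior mean of the expected reward 1/rate; needs shape > 1."""
    shape, rate = posterior
    if shape <= 1:
        raise ValueError("posterior mean reward is undefined for shape <= 1")
    return rate / (shape - 1)

# Hypothetical arm: Gamma(2, 1) prior, then two observed rewards.
state = update((2.0, 1.0), [1.0, 3.0])
expected_reward = posterior_mean_reward(state)
```

An index policy would map each arm's `(shape, rate)` state to an index value and allocate the next participant to the arm with the largest index. The posterior mean used here is only a simple summary of the state, not a substitute for that index.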
